Abstract We survey concepts at the frontier of research connecting artificial, animal and human cognition to computation and information processing, from the Turing test to Searle’s Chinese Room argument, from Integrated Information Theory to computational and algorithmic complexity. We start by arguing that passing the Turing test is a trivial computational problem and that its pragmatic difficulty sheds light on the computational nature of the human mind more than it does on the challenge of artificial intelligence. We then review our proposed algorithmic information-theoretic measures for quantifying and characterizing cognition in various forms. These are capable of accounting for known biases in human behavior, thus vindicating a computational algorithmic view of cognition as first suggested by Turing, but this time rooted in the concept of algorithmic probability, which in turn is based on computational universality while being independent of computational model, and which has the virtue of being predictive and testable as a model theory of cognitive behavior.
1 The algorithmic model of mind


Judea Pearl, a leading theorist of causality, believes that every computer scientist is in a sense a frustrated psychologist, because computer scientists learn about themselves (and about others) by writing computer programs that emulate intelligent human behavior [Pearl 2012]. As Pearl suggests, computer programs are in effect an enhanced version of an important intellectual component of ourselves: they perform calculations for us, albeit not always as we ourselves would perform them. Pearl claims that by emulating ourselves we externalize a part of our behavior; we mirror our inner selves, allowing them to become objects of investigation. We are able to monitor the consequences of changing our minds by changing the code of a computer program, which is effectively a reflection of our minds.

Perhaps Alan Turing was the first such frustrated psychologist, attempting to explain human behavior by means of mechanical processes, first in his seminal work on universal computation [Turing 1938], but also in his later landmark paper connecting intelligence and computation through an imitation game [Turing 1950]. On the one hand, Artificial Intelligence has sought to automate aspects of behavior that would in the past have been considered intelligent, in the spirit of Turing’s first paper on universal computation. It has driven the evolution of computer programs from mere arithmetic calculators to machines capable of playing chess at levels beyond human capacity and answering questions at near-human (and in some domains beyond-human) performance, to machines capable of linguistic translation and face recognition at human standards. On the other hand, the philosophical discussion epitomized by Turing’s later paper [Turing 1950] on human and machine intelligence prompted an early tendency to conflate intelligence and consciousness, generating interesting responses from scholars such as John Searle. Searle’s Chinese Room argument (CRA) [Searle 1980] is not an objection against Turing’s main argument, which has its own virtues despite its many limitations, but a call to distinguish human consciousness from intelligent behavior in general. Additionally, the philosophy of mind has transitioned from materialism to functionalism to computationalism [Dodig-Crnkovic 2007], but until very recently little had been done by way of formally (conceptually and technically) connecting computation to a model of consciousness and cognition.

Despite concerns about the so-called Integrated Information Theory (IIT), which are by no means devastating or final even if valid¹, there now exists a contending formal theory of consciousness, which may be right or wrong but has the virtue of being precise and well-defined, even though it has evolved in a short period of time. Integrated Information Theory [Oizumi et al. 2014] lays the groundwork for an interesting computational and information-theoretic account of the necessary conditions for consciousness. It proposes a measure of the integration of information between interacting elements, which it takes to account for what is essential to consciousness, viz. the feeling of an internal experience, and therefore the generation of information within the system in excess of information received from the external environment and independent of what the system retrieves, if anything.

¹ See e.g. http://www.scottaaronson.com/blog/?p=1799 as accessed on December 23, 2015, where Tononi himself provided acceptable, even if not definitive, answers.

Moreover, there now exists a formal theory capable of accounting for biases in human perception that classical probability theory could only quantify but could neither explain nor turn into a working hypothesis [Gauvrit et al. 2014a; Gauvrit et al. 2014c]. This algorithmic model provides a strong formal connection between cognition and computation by means of recursion, which seems to be innate or else developed over the course of our lifetimes [Gauvrit et al. 2014c]. The theory is also not immune to arguments of super-Turing computation [Zenil & Hernandez-Quiroz 2007], suggesting that while the actual mechanisms of cognition and the mind may be very different from the operation of a Turing machine, its computational power may be that of the Turing model. But we do not yet know with certainty what conditions would be necessary or sufficient to algorithmically account for the same mental biases with more or less computational power, and this is an interesting direction for further investigation. More precisely, we need to ascertain whether theorems such as the algorithmic Coding theorem remain valid under conditions of super- or subuniversality. It is irrelevant here, however, whether the brain looks like a Turing machine; that would trivialize the question of its computational power, because nobody thinks the brain operates like a Turing machine.

The connections suggested between computation and life go well beyond cognition. For example, Sydney Brenner, one of the fathers of molecular biology, argues that Turing machines and cells have much in common [Brenner 2012], even if these connections were made in a rather top-level fashion reminiscent of old attempts to trivialize biology. More recently, it has been shown in systematic experiments with yeast that evolutionary paths taken by completely different mutations in equivalent environments reach the same evolutionary outcome [Kryazhimskiy et al. 2014], which may establish another computational property known as confluence in abstract rewriting systems, also known as the Church-Rosser property (the Turing machine model being but one type of rewriting system). This particular kind of contingency makes evolution more predictable than expected, but it also means that finding paths leading to a disease, such as a neurodegenerative disease, can be more difficult because of the great diversity of possible causes leading to the same undesirable outcome. Strong algorithmic connections between animal behavior, molecular biology and Turing computation have been explored in [Zenil et al. 2012a; Zenil & Marshall 2013]. But the connections between computation and cognition can be traced back to the very beginning of the field, which will help us lay the groundwork for what we believe is an important contribution towards a better understanding of cognition, particularly human cognition, through algorithmic lenses. Here we take the first steps towards revealing a stronger, nontrivial connection between computation (or information processing) on one side and cognition on the other by means of the theory of algorithmic information.
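Confluence can be illustrated with a toy string-rewriting system. The sketch below is our own illustration, not a model of the yeast experiments: it assumes a single hypothetical rule `ba -> ab`, applies it at every possible position, explores every reduction path, and collects the normal forms (strings to which no rule applies). A confluent, terminating system yields exactly one normal form whichever path is taken.

```python
def one_step_rewrites(s, lhs="ba", rhs="ab"):
    """Every string reachable from s by one application of the rule lhs -> rhs."""
    results = set()
    i = s.find(lhs)
    while i != -1:
        results.add(s[:i] + rhs + s[i + len(lhs):])
        i = s.find(lhs, i + 1)
    return results


def normal_forms(start):
    """Explore all reduction paths from start and collect the irreducible strings."""
    seen, stack, finals = {start}, [start], set()
    while stack:
        s = stack.pop()
        successors = one_step_rewrites(s)
        if not successors:
            finals.add(s)  # no rule applies: s is a normal form
        for t in successors - seen:
            seen.add(t)
            stack.append(t)
    return finals


print(normal_forms("babba"))  # prints {'aabbb'}: all paths converge
```

Each rewrite removes one "b before a" inversion, so every path terminates, and all of them end in the same sorted string.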
1.1 The Turing test is trivial, ergo the mind is algorithmic

In principle, passing the Turing test is trivially achievable by brute force. This can be demonstrated using a simple thought experiment of the Chinese Room Argument type. Take the number of (comprehensible) sentences in a conversation. This number is finite: the sentences a human can understand are bounded in length, and over a finite vocabulary there are only finitely many sentences of bounded length. Write a lookup table with all possible answers to any possible question. A thorough introduction to these ideas is offered in [McDermott 2014]. Passing the Turing test is then trivially achievable in finite time and space by brute force; it is just a combinatorial problem, but nobody would suspect the brain of operating in this way. Lookup tables run in O(1) computational time, but if a machine is to pass the Turing test by brute force, the size of its table would grow exponentially while sentence length grows only linearly, given the wide range of possible answers. Passing the Turing test in this way, even for conservative sentence and conversation lengths, would require more space than the observable universe.
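The exponential growth can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, a vocabulary of 10,000 words, counts every word sequence of length n as a table key (ignoring grammar, which removes only a fraction of them), and compares the result with the commonly quoted order of magnitude of 10^80 atoms in the observable universe.

```python
import math

ATOMS_LOG10 = 80  # commonly quoted order of magnitude for atoms in the observable universe


def table_size_log10(vocabulary_size, sentence_length):
    """log10 of the number of word sequences of a given length, i.e. log10(V**n)."""
    return sentence_length * math.log10(vocabulary_size)


for n in (5, 10, 20, 25):
    size = table_size_log10(10_000, n)  # assumed vocabulary of 10,000 words
    print(f"{n:2d} words: ~10^{size:.0f} keys, more than atoms: {size > ATOMS_LOG10}")
```

Each extra word multiplies the number of keys by the vocabulary size; keys for whole conversations are products of many such factors.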

One can make the case for certain sentences of some infinite nature that may be understood by the human mind. For example, we can build a nested sentence using the word “meaning”. “The meaning of meaning” is still a relatively easy sentence to grasp; the third level of nesting, however, may already be a challenge, and if it is not, then one can nest it n times until the sentence appears beyond human understanding. One can conceive of a large n for which some human understanding is still possible, and even argue for an unbounded, if not infinite, n from which the human mind can still draw some meaning, making such sentences impossible for a lookup-table implementation to deal with, because n is unbounded and ever increasing. This goes along the lines of Hofstadter’s self-referential loops [Hofstadter 2007]. Think of the sentence “What is the meaning of this sentence?”, which one can again nest several times, making it a conundrum. Even after an arbitrary number of nestings, the human mind may still be able to grasp something, even if it is a false understanding of its meaning. The sentence may even collapse on itself into a sort of strange loop in which, the larger the n, the less absolute information content (not only relative to its size) the sentence has, approaching 0 additional meaning for ever larger n, and hence yielding finite meaning out of an asymptotically infinite nested sentence. Understanding these sentences in the described way seems to require “true consciousness” of the nature of the context in which they are constructed, similar to arguments in favor of “intuition” [Penrose 1990] spanning different levels of understanding (inside and outside the theory), leading to and based upon Gödel-type arguments [Gödel 1931].

While it is true that not all combinations of words form valid grammatical sentences, passing the Turing test by brute force may also require the capacity to recognize invalid sentences, either to avoid them or to find suitable answers for each of them. This conservative number of combinations also assumes that the lookup table produces the same answer for the same sentence, because the same sentence will be assigned the same index by a hash function. This amounts to implementing a lookup table with no additional information, such as the likelihood that a given word will be next to another, that would effectively reduce the number of combinations. But assuming a raw lookup table with no other algorithmic implementation, in order to implement some sort of memory it would need to work at the level of conversations rather than sentences or even paragraphs. Only then can the questioner get a reasonable answer to the question “Do you remember what my first question was?”, as they must if the machine is to pass a well-designed Turing test. This means that, in order to answer any possible question related to previous questions, the lookup table has to be astronomically larger still; it does not mean that a lookup-table approach cannot pass the Turing test. While the final number of combinations of possible conversations, even using the most conservative numbers, is many orders of magnitude larger than the largest astronomical magnitudes, it is still a finite number. This means that, on the one hand, the Turing test is computationally trivially achievable, because one only needs to build a large enough lookup table to have an answer for each possible sentence in a conversation. On the other hand, given both that the Turing test is in practice impossible to pass by brute force using a lookup table and that passing the test is in fact achievable by the human mind, it cannot be the case that the mind implements a lookup table [Kirk 1993; Perlis 1990]. The respective “additional mechanisms” are the key to the cognitive abilities.
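A toy version of such a conversation-level lookup table is sketched below. The class name, the table entries and the fallback answer are hypothetical. Keying on the whole history of questions, rather than on the last sentence, is what lets a pure lookup answer “Do you remember what my first question was?”, and it is also why the table explodes: every possible history needs its own key.

```python
from typing import Dict, List, Tuple

Conversation = Tuple[str, ...]


class LookupTableResponder:
    """Hypothetical brute-force responder: no reasoning, only table lookups."""

    def __init__(self, table: Dict[Conversation, str], fallback: str = "I don't know."):
        self.table = table
        self.fallback = fallback
        self.history: List[str] = []

    def reply(self, question: str) -> str:
        self.history.append(question)
        # The dict hashes the whole tuple of questions asked so far, so identical
        # histories always hit the same entry and get the same answer. Answers need
        # not be part of the key: they are determined by the earlier questions.
        return self.table.get(tuple(self.history), self.fallback)


table = {
    ("What is 2 + 2?",): "4.",
    ("What is 2 + 2?", "Do you remember what my first question was?"):
        "You asked what 2 + 2 is.",
}
bot = LookupTableResponder(table)
print(bot.reply("What is 2 + 2?"))
print(bot.reply("Do you remember what my first question was?"))
```

Asked the memory question without the preceding exchange, the same table has no entry and falls back, which is the point: memory here is nothing but more keys.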

And this is indeed what Searle helped emphasize. Objections to the Turing test may be of a metaphysical stripe, or may follow Searle and introduce resource constraints. But this means that either the mind has certain metaphysical properties that cannot be represented and reproduced by science, or the Turing test, and therefore the computational operation of the mind, can only make sense if resources are taken into account [Parberry 1997; Dowe and Hajek 1997; Dowe and Hajek 1998]. That is to say, the Turing test must be passed using a certain limited amount of space and in a certain limited time; if not, then machine intelligence is unrelated to the Turing test (and to computing), and must be understood as a form of rule/data compression and decompression. Searle is right in that the brain is unlikely to operate as a computer program working on a colossal lookup table, and the answer can be summarized by Chaitin’s dictum “compression is comprehension” [Chaitin 1966]. We believe in fact, just as Bennett does [Bennett 1998], that decompression, i.e. the calculation to arrive at the decompressed data, is also a key element, but one that falls into the algorithmic realm that we are defending.
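The interplay between compression and decompression can be illustrated, loosely, with a general-purpose compressor. The sketch uses Python's standard zlib module as a stand-in for compression (an assumption made for illustration; zlib exploits only statistical regularities): a sequence generated by a simple rule shrinks dramatically and is recovered exactly on decompression, while random bytes do not shrink at all.

```python
import os
import zlib

regular = b"HT" * 500            # 1,000 bytes produced by a simple rule
random_bytes = os.urandom(1000)  # 1,000 bytes with (almost surely) no rule

for name, data in (("regular", regular), ("random", random_bytes)):
    packed = zlib.compress(data, 9)
    # Decompression recovers the original exactly: nothing is lost.
    assert zlib.decompress(packed) == data
    print(f"{name}: {len(data)} bytes -> {len(packed)} bytes")
```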

One may have the feeling that Searle’s point was related to grounding and to semantics versus syntax, and thereby that algorithms are still of a syntactic nature, but Searle recognizes that he is not claiming anything metaphysical. What we claim is that, since certain computer programs (such as a giant lookup table) cannot explain the understanding of the human mind, human understanding must be related to features of a highly algorithmic nature (e.g. compression/decompression), and thus that not all computer programs are the same, particularly when resources are involved. This resource constraint comes from the brain’s optimization of all sorts of cost functions, achieved by striving for optimal learning [Zenil et al. 2012a], such as minimizing energy consumption, to mention but one example, and in this optimal behavior algorithmic probability [Solomonoff 1964] must play an essential role.
In light of the theoretical triviality of passing the Turing test, the need to consider the question of resources, and therefore of computational complexity, has been stressed [Parberry 1997; Aaronson 2013]. This means that the mind harnesses mechanisms to compress large amounts of information in efficient ways. An interesting connection to integrated information can be established by way of compression. Imagine one is given two files, one incompressible and the other compressible. For the incompressible file, any change to the uncompressed version can simply be replicated in the compressed version, the two being effectively the same, because every bit is independent of every other in both versions. But reproducing a given change in the uncompressed version of the compressible file requires a cryptic change in its compressed version, because the compression algorithm takes advantage of short- and long-range regularities, i.e. non-independence of the bits, encoded in a region of the compressed file that may be very far afield in the uncompressed version. This means that incompressible files have little integrated information, whereas compressible files have a greater degree of integrated information. Similar ideas are explored in [Maguire et al. 2014], which makes a case against integrated information theory by arguing that it does not conform to the intuitive concept of consciousness. For what people mean by ‘consciousness’ is that a system cannot be broken down [Oizumi et al. 2014]: if it could, it would not amount to “consciousness”.
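The argument about the two files can be made concrete with run-length encoding (RLE), a deliberately simple compressor chosen here for illustration only; the helper names are ours. Editing one number in the RLE form of a regular string changes many symbols of the decompressed string, whereas for a string with no runs, on which RLE achieves no compression, a one-symbol edit of the encoded form changes exactly one symbol of the output.

```python
from itertools import groupby
from typing import List, Tuple

Pairs = List[Tuple[str, int]]


def rle_encode(s: str) -> Pairs:
    """Compress s into (symbol, run length) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(s)]


def rle_decode(pairs: Pairs) -> str:
    return "".join(sym * count for sym, count in pairs)


def differences(a: str, b: str) -> int:
    """Positions that differ, plus any difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Compressible: two long runs, encoded as [('0', 32), ('1', 32)].
regular = "0" * 32 + "1" * 32
edited = [("0", 16), ("1", 32)]                          # one number changed
print(differences(regular, rle_decode(edited)))          # prints 32

# No runs: RLE needs one pair per symbol, so nothing is gained.
alternating = "01" * 8
pairs = rle_encode(alternating)
edited_alt = [("1", 1)] + pairs[1:]                      # one symbol changed
print(differences(alternating, rle_decode(edited_alt)))  # prints 1
```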

In practice, search engines and the web constitute a sort of lookup table of exactly the type attacked by Searle’s CRA. Indeed, the probability of finding a webpage containing a permutation of words representing a short sentence is high and tends to increase over time, even though the probability grows exponentially slowly due to the combinatorial explosion. The web contains about 4.73 billion pages (http://www.worldwidewebsize.com/ as estimated and accessed on Wednesday, 07 January, 2015) with text mostly written by humans. But the fact that the web, seen through a search engine, is a lookup table of sorts is an argument against the idea that the web is a sort of “global mind”, and is therefore consonant with Searle’s argument. Indeed, there is very little evidence that anything on the web is in compressed form (this is slightly different from the Internet in general, where many transactions and protocols implement compression in one way or another, e.g. encryption). We are therefore proposing that compression is a necessary condition for minds and consciousness, but clearly not a sufficient one (cf. compressed files, such as PNG images on a computer hard drive).

It has been found in field experiments that animals display a strong behavioral bias. For example, in Reznikova’s communication experiments with ants, simpler instructions were communicated faster and more effectively than complex ones by scout ants searching for patches of food randomly located in a maze [Reznikova & Ryabko 2011; Reznikova & Ryabko 2012; Ryabko & Reznikova 2009] (see also http://grahamshawcross.com/2014/07/26/counting-ants/, accessed on Dec 23, 2014). The instructions are sequences of right and left movements, encoded in binary (R and L), which the scout ants communicate on returning to their foraging team in the colony, so that the team can reach the food. We have quantified some of these animal behavior experiments, confirming the authors’ suspicion that algorithmic information theory could account for the biases [Zenil et al. 2015].

Humans too display biases in the same algorithmic direction, from their motion trajectories [Peng et al. 2014] to their perception of reality [Chater 1999]. Indeed, we have shown that cognition, including visual perception and the generation of subjective randomness, shows a bias that can be accounted for with the seminal concept of algorithmic probability [Gauvrit et al. 2014a; Gauvrit et al. 2014b; Gauvrit et al. 2014c; Kempe et al. 2015; Mathy et al. 2014]. Using a computer to look at human behavior in a novel fashion, specifically by means of a reverse Turing test in which the human mind is compared with an “average” Turing machine or computer program implementing any possible compression algorithm, we will show that the human mind behaves more like a machine. We will in effect reverse the question Turing originally posed via his imitation game, namely whether machines behave like us.


2 Algorithmic complexity as a model of the mind

Since the emergence of the Bayesian paradigm in cognitive science, researchers have expressed the need for a formal account of complexity based on a sound complexity measure. They have also struggled to find a way of giving a formal normative account of the probability that a deterministic algorithm produces a given sequence, such as the heads-or-tails string “HTHTHTHT”, which intuitively looks like the result of a deterministic process even though it has the same probability of occurrence as any other sequence of the same length according to classical probability theory, which assigns a probability of 1/2^8 = 1/256 to every sequence of length 8.


2.1 From bias to Bayes 

Among the diverse areas of cognitive science that have expressed a need for a new 
complexity measure, the most obvious is the field of probabilistic reasoning. The 
famous work of Kahneman, Slovic and Tversky (1982) aimed at understanding how 
people reason and make decisions in the face of uncertain and noisy information 
sources. They showed that humans were prone to many errors about randomness 
and probability. For instance, people tend to claim that the sequence of heads or 
tails “HTTHTHHHTT” is more likely to appear when a coin is tossed than the series 
“HHHHHTTTTT”. 

In the “heuristics and biases” approach advocated by Kahneman and Tversky, these “systematic” errors were interpreted as biases inherent in human psychology, or else as the result of using faulty heuristics. For instance, it was believed that people tend to say that “HHHHHTTTTT” is less random than “HTTHTHHHTT” because they are influenced by a so-called representativeness heuristic, according to which a sequence is more random the better it conforms to prototypical examples of random sequences. Human reasoning, it has been argued, works like a faulty computer. Although many papers have been published about these biases, not much is known about their causes.

Another example of a widespread error is the so-called “equiprobability bias” [Lecoutre 1992], a tendency to believe that any random variable should be uniform (with equal probability for every possible outcome). In the same vein as the seminal investigations by Kahneman et al. [Kahneman et al. 1982], this bias too, viz. the erroneous assumption that randomness implies uniformity, has long been interpreted as a fundamental flaw of the human mind.

Over the last few decades, a paradigm shift has occurred in cognitive science. The “new paradigm”, or Bayesian approach, suggests that the human (or animal) mind is not a faulty machine, but a probabilistic machine of a certain type. According to this understanding of human cognition, we all estimate and constantly revise probabilities of events in the world, taking into account any new pieces of information, and more or less following probabilistic (including Bayesian) rules.

Studies along these lines often try to explain our probabilistic errors in terms of a sound intuition about randomness or probability applied in an inappropriate context. For instance, a mathematical and psychological reanalysis of the equiprobability bias was recently published [Gauvrit & Morsanyi 2014]. The mathematical theory of randomness, based on algorithmic complexity (or, as it happens, on entropy), does in fact imply uniformity. Thus, claiming that the intuition that randomness implies uniformity is a bias does not fit with mathematical theory. On the other hand, if one follows the mathematical theory of randomness, one must admit that a combination of random events is, in general, no longer random. Thus, the equiprobability bias (which is indeed a bias, since it yields frequent wrong answers in probability classes) is not, we argue, the result of a misconception regarding randomness, but a consequence of the incorrect intuition that random events can be combined without affecting their property of randomness.
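A classic illustration of this point, chosen here by us rather than taken from the cited reanalysis: each of two fair dice is uniform, but their sum is not. Exact fractions avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice.
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))
distribution = {total: Fraction(count, 36) for total, count in sorted(sums.items())}

print(distribution[2], distribution[7])  # prints 1/36 1/6
```

Someone holding the equiprobability intuition would expect each of the 11 possible sums to have probability 1/11.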

We now believe that when we are asked to compare the probability that a fair coin produces “HHHHHTTTTT” with that of any other 10-item series, we do not really do so. One reason is that the question is unnatural: our brain is built to estimate the probabilities of the causes of observed events, not the a priori probability of such events. Therefore, researchers argue, when participants rate the probability of the string s = “HHHHHTTTTT” (or any other), they do not actually estimate the probability that such a string will appear on tossing a fair coin, which we could write as P(s|R), where R stands for “random process”, but the reverse probability P(R|s), that is, the probability that the coin is fair (or that the string is genuinely random), given that it produced s. Such a probability can be estimated using Bayes’ theorem:


$$P(R|s) = \frac{P(s|R)\,P(R)}{P(s|R)\,P(R) + P(s|D)\,P(D)},$$

where D stands for “not random” (or deterministic).
In this equation, the only problematic element is P(s|D), the probability that an undetermined but deterministic algorithm will produce s. It was long believed that no normative measure of this value could be assumed, although some authors had the intuition that it was linked to the complexity of s: simple strings are more likely to appear as the result of an algorithm than complex ones.

The algorithmic theory of information actually provides a formal framework for this intuition. The algorithmic probability of a string s is the probability that a randomly chosen program running on a universal (prefix-free) Turing machine will produce s and then halt. It therefore serves as a natural formal definition of P(s|D). As we will see below, the algorithmic probability of a string s is inversely linked to its (Kolmogorov-Chaitin) algorithmic complexity, defined as the length of the shortest program that produces s and then halts: simpler strings have a higher algorithmic probability.

One important drawback of algorithmic complexity is that it is not computable. However, there now exist methods to approximate the probability, and thus the complexity, of any string, even short ones (see below), giving rise to a renewed interest in complexity in the cognitive sciences.

Using these methods [Gauvrit et al. 2014b], we can compute that, with a prior of 0.5, the probability that the string HHHHHTTTTT is random amounts to 0.58, whereas the probability that HTTHTHHHTT is random amounts to 0.83, thus confirming the common intuition that the latter is “more random” than the former.
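The Bayesian computation can be sketched in a few lines. The sketch assumes P(s|R) = 2^-n for a binary string of length n produced by a fair coin, and replaces P(s|D) with 2^-K(s), i.e. the Coding theorem with its additive constant dropped; the complexity value K(s) has to be supplied, for instance from ACSS estimates. The function name and the example values are hypothetical, and the sketch is not claimed to reproduce the figures above.

```python
def prob_random(n: int, k_bits: float, prior_random: float = 0.5) -> float:
    """Posterior P(R|s) for a binary string s of length n, via Bayes' theorem.

    Assumptions: P(s|R) = 2**-n (fair coin) and P(s|D) = 2**-k_bits, where
    k_bits estimates K(s) (Coding theorem, additive constant ignored).
    """
    p_s_given_r = 2.0 ** -n
    p_s_given_d = 2.0 ** -k_bits
    numerator = p_s_given_r * prior_random
    return numerator / (numerator + p_s_given_d * (1.0 - prior_random))


# A string whose estimated complexity equals its length is as likely random as not...
print(prob_random(10, 10.0))  # prints 0.5
# ...while a simpler string (lower K) is more likely to come from an algorithm.
print(prob_random(10, 7.0) < 0.5)  # prints True
```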


2.2 The Coding theorem method 


One method for assessing Kolmogorov-Chaitin complexity, namely lossless compression, as epitomized by the Lempel-Ziv algorithm, has been long and widely used. Such a tool, together with classical Shannon entropy [Wang et al. 2014], has recently been used in neuropsychology to investigate the complexity of EEG or fMRI data [Maguire et al. 2014; Casali et al. 2013]. Indeed, the size of a compressed file gives an indication of its algorithmic complexity; it is, in fact, an upper bound of the true algorithmic complexity. Compression methods nevertheless have two drawbacks. First, they have a basic flaw: they can only recognize statistical regularities and are therefore implementations of variations of entropic measures, only assessing the rate of entropic convergence based on repetitions of strings of fixed sliding-window size. If lossless compression algorithms work to approximate Kolmogorov complexity, it is only because compression is a sufficient test for nonrandomness; they clearly fail in the other direction, when it is a matter of ascertaining whether something is the result of, or is produced by, an algorithmic process (e.g. the digits of the mathematical constant π). That is, they cannot find structure that takes forms other than simple repetition. Second, compression methods are inappropriate for short strings (of, say, less than a few hundred symbols). For short strings, lossless compression algorithms often yield files that are longer than the strings themselves, hence providing results that are very unstable and difficult, if not impossible, to interpret.
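Both limitations are easy to observe with Python's standard zlib module (a DEFLATE implementation, i.e. LZ77 plus Huffman coding, used here as a stand-in for lossless compressors in general). A 10-symbol string comes out longer than it went in, and bytes from a seeded pseudo-random generator, which a few lines of code suffice to produce, do not compress at all.

```python
import random
import zlib

short = b"HTHTHTHTHT"
print(len(short), len(zlib.compress(short, 9)))  # the "compressed" form is longer

# A short program (seed + generator) produces these bytes, so their algorithmic
# complexity is low relative to their length, yet zlib finds no statistical
# regularity to exploit.
rng = random.Random(2014)
prng_bytes = bytes(rng.getrandbits(8) for _ in range(10_000))
print(len(prng_bytes), len(zlib.compress(prng_bytes, 9)))
```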

In cognitive and behavioral science, researchers usually deal with short strings 
of at most a few tens of symbols, for which compression methods are thus useless. 
This is the reason they have long relied on tailor-made measures instead. 

The Coding theorem method (CTM) [Delahaye & Zenil 2012; Soler-Toscano et al. 2014] has been specifically designed to address this challenge. By using CTM, researchers have provided values for the “algorithmic complexity for short strings” (which we will abbreviate as “ACSS”). ACSS is freely available as an R package (named ACSS), or through an online complexity calculator (www.complexitycalculator.com, accessed on 26 Dec, 2014) [Gauvrit et al. 2014a].

At the root of ACSS is the idea that algorithmic probability may be used to capture algorithmic complexity. The algorithmic probability of a string s is defined as the probability that a universal prefix-free Turing machine U will produce s and then halt. Formally,

$$m_U(s) = \sum_{p\,:\,U(p)=s} \frac{1}{2^{|p|}}$$

The algorithmic complexity [Kolmogorov 1965; Chaitin 1966] of a string s is defined as the length of the shortest program that, running on a universal prefix-free [Levin 1974] Turing machine U, will produce s and then halt. Formally,

$$K_U(s) = \min\{|p| : U(p) = s\}$$

K_U(s) and m_U(s) both depend on the choice of the Turing machine U. Thus, the expression “the algorithmic complexity of s” is, in itself, a shortcut. For long strings, this dependency is relatively small. Indeed, the invariance theorem [Solomonoff 1964; Kolmogorov 1965; Chaitin 1966] states that for any two universal prefix-free Turing machines U and U', there exists a constant c independent of s such that

$$|K_U(s) - K_{U'}(s)| < c$$

The constant c can be arbitrarily large. If one wishes to approximate the algorithmic complexity of short strings, the choice of U is thus decisive.

To overcome this inconvenience, we can take advantage of a formal link established between algorithmic probability and algorithmic complexity. The algorithmic Coding theorem [Levin 1974] states that

$$K_U(s) = -\log_2 m_U(s) + O(1)$$

This theorem can be used to approximate K_U(s) through an estimation of m_U(s).

Instead of choosing a particular arbitrary universal Turing machine and feeding it with random programs, Delahaye and Zenil (2012) had the idea (equivalent in a formal way) of using a huge sample of Turing machines running on blank tapes. By doing so, they built a “natural” (quasi-lexicographical enumeration) experimental distribution approaching m(s), conceived of as an average m_U(s).
They then defined ACSS(s) as −log_2(m(s)). ACSS(s) approximates an average K_U(s). To validate the method, we studied how ACSS varies under different conditions. ACSS as computed with different huge samples of small Turing machines was found to remain stable [Gauvrit et al. 2014a]. Also, several descriptions of Turing machines did not alter ACSS [Zenil & Delahaye 2010]. Using cellular automata instead of Turing machines, it was shown that ACSS remained relatively stable. On a more practical level, ACSS is also validated by experimental results. For instance, as we will see below, ACSS is linked to human complexity perception. It has also found applications in graph theory and network biology [Zenil et al. 2014].
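The logic of the Coding theorem method can be sketched on a tiny scale. The sketch below is a simplified illustration, not the published implementation: it enumerates every 2-symbol Turing machine with n states, using the (4n+2)^(2n) counting in which a transition either writes, moves and changes state, or writes and halts; runs each one from state 0 on a blank tape with a step cap; takes as output the cells the head visited (our assumed output convention); and turns output frequencies into complexity estimates via −log2. The published ACSS values rest on far larger machine spaces and further conventions.

```python
import itertools
import math
from collections import Counter

HALT = -1  # pseudo-state meaning "stop"


def run_on_blank_tape(rules, max_steps=100):
    """Run a 2-symbol Turing machine from state 0 on an all-0 tape.

    rules maps (state, read_symbol) -> (write_symbol, move, next_state).
    Returns the visited cells as a string if the machine halts within
    max_steps, otherwise None.
    """
    tape, pos, state = {}, 0, 0
    lo = hi = 0
    for _ in range(max_steps):
        write, move, nxt = rules[(state, tape.get(pos, 0))]
        tape[pos] = write
        if nxt == HALT:
            return "".join(str(tape.get(i, 0)) for i in range(lo, hi + 1))
        pos += move
        lo, hi = min(lo, pos), max(hi, pos)
        state = nxt
    return None


def output_counts(n_states, max_steps=100):
    """Enumerate all (4n+2)**(2n) machines and count the outputs of halting ones."""
    keys = [(q, b) for q in range(n_states) for b in (0, 1)]
    actions = [(w, m, q) for w in (0, 1) for m in (-1, 1) for q in range(n_states)]
    actions += [(w, 0, HALT) for w in (0, 1)]  # halting transitions do not move
    counts = Counter()
    for choice in itertools.product(actions, repeat=len(keys)):
        out = run_on_blank_tape(dict(zip(keys, choice)), max_steps)
        if out is not None:
            counts[out] += 1
    return counts


def ctm(counts):
    """Coding-theorem estimate: -log2 of each output's relative frequency."""
    total = sum(counts.values())
    return {s: -math.log2(c / total) for s, c in counts.items()}


estimates = ctm(output_counts(2))
for s in sorted(estimates, key=estimates.get)[:6]:
    print(s, round(estimates[s], 2))
```

With one state, only machines that halt on their first transition halt at all, so "0" and "1" each get exactly 1 bit. With two states (10,000 machines), the single-cell outputs "0" and "1" are each produced by 1,000 machines, and mirroring every move shows that a string and its reversal always receive the same estimate.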


2.3 The Block Decomposition Method 

ACSS provides researchers with a user-friendly tool (Gauvrit et al. 2014b) for 
assessing the algorithmic complexity of very short strings. Despite the huge sample 
(billions) of Turing machines used to build $m(s)$ and $K(s)$ with the Coding theorem 
method, two strings of length 12 were never produced; ACSS can nevertheless be used 
for strings up to length 12 by assigning these two missing strings the greatest 
complexity in the set plus 1. 
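The completion rule just described can be sketched in Python as follows. The table values are invented for illustration (real tables come from the Coding theorem method), and the helper name `complete_table` is hypothetical; "the set" is read here as the strings of the same length.

```python
from itertools import product

def complete_table(k_table, length, alphabet="01"):
    """Return a copy of k_table in which every string of the given length that
    is missing receives the largest observed value at that length plus 1."""
    observed = [k for s, k in k_table.items() if len(s) == length]
    ceiling = max(observed) + 1
    filled = dict(k_table)
    for symbols in product(alphabet, repeat=length):
        s = "".join(symbols)
        # Only strings never produced by the sampled machines get the ceiling.
        filled.setdefault(s, ceiling)
    return filled

# Made-up values for length-2 strings; "10" plays the role of a never-produced string.
toy = {"00": 3.0, "11": 3.0, "01": 4.5}
filled = complete_table(toy, 2)
```

Giving unseen strings a value just above the observed maximum encodes the intuition that a string no small machine produced is at least as complex as the most complex one that was produced.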

The same method applied to two-dimensional Turing machines produced a sample of 
binary patterns on grids, but here again of limited range. Not every 5 × 5 grid is 
present in the output, but one can approximate the complexity of any n × n square 
grid by decomposing it into 4 × 4 squares, for which estimates are known from a 
two-dimensional Coding theorem method. That method simply involved running 
Turing machines on lattices rather than tapes, after which all the theory around the 
Coding theorem could be applied and a two-dimensional Coding theorem method 
conceived. 
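The decomposition step for grids can be sketched in Python as below. This is only the splitting into 4 × 4 blocks, under the simplifying assumption that the side length is a multiple of 4; how the published method treats leftover rows and columns is not reproduced here, and the name `split_into_blocks` is hypothetical.

```python
def split_into_blocks(grid, size=4):
    """Split a square grid (list of equal-length rows of 0/1) into
    non-overlapping size x size blocks, read left to right, top to bottom.
    Assumes the side length is a multiple of `size`."""
    n = len(grid)
    if n % size or any(len(row) != n for row in grid):
        raise ValueError("grid must be square with side a multiple of size")
    blocks = []
    for r in range(0, n, size):
        for c in range(0, n, size):
            # Tuples make blocks hashable, so repeated blocks can be counted.
            block = tuple(tuple(grid[i][c:c + size]) for i in range(r, r + size))
            blocks.append(block)
    return blocks

# An 8 x 8 checkerboard splits into four identical 4 x 4 blocks.
board = [[(i + j) % 2 for j in range(8)] for i in range(8)]
blocks = split_into_blocks(board)
```

Each block can then be looked up in a table of two-dimensional Coding theorem estimates, and identical blocks can be counted as repetitions.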

This idea of decomposition made it possible to fill the gap between the scope of 
classical lossless compression methods (long strings) and the scope of ACSS (short 
strings). To this end, a new method based on ACSS was developed: the Block 
decomposition method (Zenil et al. 2014). 

The basic idea at the root of the Block decomposition method is to decompose a 
string into shorter substrings (possibly overlapping) for which approximations of 
Kolmogorov complexity are available. For instance, the string “123123456” can be 
decomposed into the 6-symbol subsequences “123123” and “123456”, which overlap 
by 3 symbols. The Block decomposition method then takes advantage of known 
information-theoretic properties by penalizing n repetitions, which can be 
transmitted using only $\log_2(n)$ bits, through the formula: 

$$C(s) = \sum_{p} \left[ \log_2(n_p) + K(p) \right]$$

where p ranges over the distinct types of substrings, $n_p$ is the number of 
occurrences of each p, and $K(p)$ is the complexity of p as approximated by ACSS. 
As the formula shows, the Block decomposition method takes into account both the 
local complexity of the string and (possibly long-distance) repetitions. 
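The formula can be sketched in a few lines of Python. The complexity table passed in is a hypothetical stand-in for ACSS values, the helper names are ours, and trailing symbols that do not fill a whole window are simply dropped here, which is a simplification rather than the published boundary handling.

```python
import math
from collections import Counter

def blocks_1d(s, size, step):
    """Windows of length `size` taken every `step` symbols (step < size overlaps).
    Trailing symbols that do not fill a whole window are ignored in this sketch."""
    return [s[i:i + size] for i in range(0, len(s) - size + 1, step)]

def bdm(s, k_table, size, step):
    """C(s) = sum over distinct blocks p of [log2(n_p) + K(p)]."""
    counts = Counter(blocks_1d(s, size, step))
    return sum(math.log2(n) + k_table[p] for p, n in counts.items())

# The text's example: 6-symbol windows overlapping by 3 symbols.
windows = blocks_1d("123123456", 6, 3)

# Made-up K values: three copies of "0101" cost K + log2(3), not 3K.
repeated = bdm("010101010101", {"0101": 5.0}, 4, 4)
distinct = bdm("00000111", {"0000": 2.0, "0111": 3.5}, 4, 4)
```

A block that occurs once contributes only its $K(p)$, since $\log_2(1) = 0$; repetitions add only logarithmically, which is how the method rewards long-distance regularities.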

With the Coding theorem method leading to ACSS, the Block decomposition method 
and compression algorithms, the whole range of string lengths can now be covered 
by a family of formal methods. These new tools are highly relevant to cognitive and 
behavioral science. In the next section, we describe three recent areas of cognitive 
science in which algorithmic complexity plays an important role. In these areas, 
ACSS and the Block decomposition method have been key to approximating the 
algorithmic complexity of short (2 to 12) and medium (10 to 100) length strings and 
arrays. 


3 Cognition and complexity 

An early application to cognition of the Coding theorem and Block decomposition 
methods quantified the complexity of the short sequences in Reznikova’s ant 
communication experiments, validating their results (Zenil 2013). Indeed, it was 
found that ants communicate simpler strings (related to instructions for getting to 
food in a maze) in a shorter time, thus establishing an experimental connection 
between animal behavior, algorithmic complexity and time complexity. 

The idea was taken further to establish a relation between information content 
and energy (spent both in foraging and in communicating), yielding a fitness 
landscape based on the thresholds between these currencies (Zenil et al. 2012a). 
As shown there, if the environment is too predictable, the cost of information 
processing is very low. In contrast, if the environment is random, the cost is at a 
maximum. The results obtained by these information-processing methods suggest 
that organisms with better learning capabilities save more energy and therefore 
have an evolutionary advantage. 

In many ways, animal behavior (including, notably, human behavior) suggests 
that the brain acts as a data compression device. For example, despite our very 
limited memory, it is clear that we can retain long strings if they have low 
algorithmic complexity (e.g. the sequence 123456... vs the digits of π, see below). 
Cognitive and behavioral science deals for the most part with small series, barely 
exceeding a few tens of values. For such short sequences, estimating the algorithmic 
complexity is a challenge. Indeed, until now, behavioral science has largely relied on 
a subjective and intuitive account of complexity (e.g. Reznikova’s ant communication 
experiments). Through the “Coding theorem method”, however, it is now possible to 
obtain a reliable approximation of the algorithmic complexity of strings of any length 
(Gauvrit et al. 2014a), and the methods have been put to the test and applied to 
validate several intuitions put forward by prominent researchers in cognitive science 
over the last few decades. We describe some of these intuitions below and explain 
how they have been confirmed by experimental studies. 
Researchers in psychology have identified different types of memory in humans and 
other animals. Two related kinds of memory have been particularly well studied: 
short-term memory and working memory. Short-term memory refers to a cognitive 
resource that allows the storage of information over a few seconds or minutes. 
According to a recent account of short-term memory (Barrouillet et al. 2009), items 
stored in short-term memory decay quickly and automatically, as a mere result of 
the passage of time, unless attention is devoted to reactivating the to-be-remembered 
items. Thus, constant reactivation is necessary to keep information stored for 
minutes. This is done by means of an internal language loop (one can continuously 
repeat the to-be-memorized items) or by using mental imagery (one can produce 
mental images of the items). On the other hand, some researchers believe that the 
decay observed in memory is entirely due to interference from new information 
(Oberauer 2004). Researchers thus disagree about the cause, although all 
acknowledge that memory traces do decay quickly. The span of short-term memory 
is estimated at around 7 items (Miller 1956; Klingberg et al. 2009). To arrive at this 
estimate, psychologists use simple memory span tasks in which participants are 
required to repeat series of items of increasing length. For instance, the 
experimenter says “3, 5” and the participant must repeat “3, 5”. Then the 
experimenter says “6, 2, 9” out loud, which the participant repeats. Short-term 
memory span is defined as the length of the longest series one can recall. Strikingly, 
this span barely depends on the type of item to be memorized (letters, digits, words, 
etc.). 

Two observations, however, shed new light on the concept of short-term memory. 
The first is the detrimental effect of cognitive load on short-term memory span. 
Although humans usually recall up to 7 items in the basic situation described above, 
the number of items correctly recalled drops if an individual must perform another 
task at the same time. For instance, a subject who had to repeat “baba” while 
memorizing the digits would probably not store as many as 7 items (Barrouillet et al. 
2004). More demanding interfering tasks (such as checking whether equations are 
true or generating random letters) lead to even lower spans. The second observation 
is that when the to-be-recalled list has some structure, one can retain more items. 
For instance, it is unlikely that anyone can memorize, in a one-shot experiment, a 
series such as “3, 5, 4, 8, 2, 1, 1, 9, 4, 5”. However, longer series such as “1, 2, 3, 4, 5, 
6, 5, 4, 3, 2, 1” are easily memorized and recalled. 

The first observation led to the notion of working memory. Working memory 
is a hypothetical cognitive resource used for both storing and processing 
information. When participants must recall series while performing a dual task, they 
assign part of their working memory to the dual task, which reduces the part of 
working memory allocated to storage, i.e. short-term memory (Baddeley 1992). 

The second observation led to the notion of chunking. It is now believed that 
humans can divide to-be-recalled lists into simple sub-lists. For instance, when they 
have to store “1, 2, 3, 4, 5, 3, 3, 3, 1, 1, 1” they can build three “chunks”: “1, 2, 3, 4, 5”, 
“3, 3, 3” and “1, 1, 1”. As this example illustrates, chunks are not arbitrary: they are 
formed so as to minimize complexity, and thus act as compression algorithms 
(Mathy & Feldman 2012). Objective evidence that chunking occurs is that, when 
recalling such a series, people make longer pauses between the hypothetical chunks 
than within them. Taking these new concepts into account, it is now believed that the 
short-term memory span is not roughly 7 items, as established using unstructured 
lists, but more accurately around 4 chunks (Cowan 2001). 
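To illustrate how chunking can act as compression, here is a toy Python sketch. It is not the model of Mathy and Feldman (2012); the rule "a chunk is a maximal run with a constant step between neighbours" is our own simplifying assumption, chosen because each such chunk can be described by just (start, step, length). The function name `chunk_runs` is hypothetical.

```python
def chunk_runs(seq):
    """Greedily split seq into maximal runs with a constant step between
    neighbours, so each chunk can be described as (start, step, length)."""
    chunks = []
    i = 0
    while i < len(seq):
        j = i  # index of the last element in the current chunk
        if i + 1 < len(seq):
            step = seq[i + 1] - seq[i]
            j = i + 1
            # Extend the chunk while the difference between neighbours stays the same.
            while j + 1 < len(seq) and seq[j + 1] - seq[j] == step:
                j += 1
        chunks.append(seq[i:j + 1])
        i = j + 1
    return chunks

structured = chunk_runs([1, 2, 3, 4, 5, 3, 3, 3, 1, 1, 1])
palindrome = chunk_runs([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1])
unstructured = chunk_runs([3, 5, 4, 8, 2, 1, 1, 9, 4, 5])
```

On the text's example this recovers the three chunks “1, 2, 3, 4, 5”, “3, 3, 3” and “1, 1, 1”, while the unstructured series breaks into five two-item pieces, which save nothing over listing the items.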

The short-term memory span thus depends on the type of list to be memorized. 
With structured series, one can recall more items, but fewer chunks. This apparent 
contradiction can be resolved by challenging the clear-cut opposition between 
short-term memory span and working memory span. To overcome the limitations of 
short-term memory, humans can take advantage of the structure present in some 
series. Doing so requires using the “processing” component of working memory to 
analyze the data (Mathy & Feldman 2012). Thus, even simple tasks in which subjects 
must only store and recall series of items do tax the processing part of working 
memory. 

According to this hypothesis, working memory works as a compression algorithm, 
where part of the resource is allocated to data storage while another part is 
dedicated to the “compression/decompression” program. Recent studies are fully in 
line with this idea. Mathy et al. (2014) used Simon®, a popular 1980s game in which 
one must reproduce series of colors (chosen from a set of 4 colors) of increasing 
length, echoing classical short-term memory tasks. They showed that the algorithmic 
complexity of a series is a better predictor of correct recall than its length. Moreover, 
when participants make mistakes, they generate, on average, an (incorrect) string of 
lower algorithmic complexity than the string to be remembered. 

All these experimental results suggest that working memory acts as a compres¬ 
sion algorithm. In normal conditions, compression is lossless, but when the items to 
be remembered exceed working memory capacity, it may be that lossy compression 
is used instead. 


3.2 Randomness perception 


Humans share some intuitive and seemingly innate (or at least precocious) 
concepts concerning mathematical facts, biology, physics and naive psychology. This 
core knowledge (Spelke 2007) is found in babies as young as a few months. In the 
field of naive mathematics, for instance, it has been shown that people share some 
basic knowledge or intuitions about numbers and geometry. Humans and other 
animals alike can discriminate quantities based on an “Approximate Number Sense”, 
thanks to which one can immediately feel that 3 objects are more than 2, or 12 more 
than 6 (Dehaene 2011). Because this number sense is only approximate, it does not 
allow us to discriminate 12 from 15, for instance. There are many indications that we 
do have an innate sense of quantity. For instance, both children and adults are able to 
“count” quantities not greater than 4 almost immediately, whereas larger quantities 
require a counting strategy. When faced with 1, 2 or 3 objects, we immediately 
perceive how many there are without counting. This phenomenon is known as 
“subitizing” (Mandler and Shebo 1982). Babies as young as six months (Xu et al. 2005), 
apes (Boysen & Hallberg 2000) and even birds (Pepperberg 2006) are able to perform 
basic comparisons between small numbers. However, for quantities above 4, we must 
resort to learned strategies such as counting. 

Similar results have been found in the field of naive geometry. In a study of an 
Amazonian indigenous group, Dehaene et al. (2006) found that people without any 
formal mathematical education naturally reason in terms of points and lines. They 
also perceive symmetry and share intuitive notions of topology, such as 
connectedness and simple connectedness (holes). 

Recent findings in the psychology of subjective randomness suggest that 
probabilistic knowledge may well figure on the list of core human knowledge. Teglas 
et al. (2011) exposed 1-year-old infants to images of jars containing various 
proportions of colored balls. When a ball is randomly taken from the jar, the duration 
of the infant’s gaze is recorded. The probability of the observed event is easy to 
compute: if a white ball is chosen, for instance, the corresponding probability is the 
proportion of white balls in the jar. It turned out that gazing time is correlated with 
the probability of the observed event: the more probable the event, the shorter the 
gaze. This is interpreted as evidence that 1-year-old children already have an intuitive 
and approximate probability sense, in the same way they have an approximate 
number sense. 

This probability sense is even richer than the previous example suggests. Xu and 
Garcia (2008) showed that even younger infants (8 months old) exhibit behaviors 
reflecting a basic intuition of Bayesian probability. In their experiment, children saw 
an experimenter take white or red balls out of an opaque jar. Depending on the trial, 
the resulting sample could exhibit an excess of red or of white. After the sample was 
removed, the jar was uncovered, revealing either a large proportion of red balls or a 
large proportion of white balls. Children stared longer at the scene when what 
transpired was less probable according to a Bayesian account of sampling. For 
instance, if the sample had a large majority of red balls but the jar, once uncovered, 
showed an excess of white ones, the infant’s gaze would remain on the scene longer. 
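To make “less probable according to a Bayesian account of sampling” concrete, the Python sketch below compares how likely one sample is under two possible jars. The numbers (a sample of 4 red and 1 white ball, jars that are 80% or 20% red) are hypothetical and are not the parameters used by Xu and Garcia (2008); draws are treated as independent, which approximates sampling from a large jar.

```python
from math import comb

def sample_likelihood(k_red, n, p_red):
    """Binomial probability of drawing k_red red balls in n independent draws
    from a jar whose proportion of red balls is p_red."""
    return comb(n, k_red) * p_red ** k_red * (1 - p_red) ** (n - k_red)

# Hypothetical sample: 4 red balls and 1 white ball.
mostly_red = sample_likelihood(4, 5, 0.8)    # jar revealed to be 80% red
mostly_white = sample_likelihood(4, 5, 0.2)  # jar revealed to be 20% red
ratio = mostly_red / mostly_white
```

Here the sample is 64 times more likely under the mostly-red jar (0.4096 vs 0.0064). With equal prior probability for the two jars, this likelihood ratio is also the posterior odds, so uncovering a mostly-white jar is the surprising outcome that, on this account, should hold an infant's gaze longer.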

In the same vein, Ma and Xu (2009) presented 9- and 10-month-old children with 
samples from a jar containing as many yellow as red balls. However, the sampling 
could be done either by a human or by a robot. Ma and Xu showed that children 
expect any regularity to come from a human hand. This experiment was meant to 
trace our intuition of intentionality behind any regularity. However, it also shows 
that humans have a precocious intuition of regularity and randomness. 

Adults are able to discriminate more finely between strings according to their 
complexity. For example, using classical randomness perception tasks, Matthews 
(2013) had people decide whether binary strings of length 21 were more likely to 
have been produced by a random process or a deterministic one. Matthews was 
primarily interested in the contextual effects associated with such a task. However, 
he also showed that adults usually agree about which strings are more random. A 
reanalysis of Matthews’ data (Gauvrit et al. 2014b) showed that algorithmic 
complexity is actually a good predictor of subjective randomness, that is, the 
probability that a human would consider that the string looks random. 

In attempting to uncover the sources of our perception of randomness, Hsu et al. 
(2010) have suggested that it could be learned from the world. More precisely, 
according to the authors, our perception of two-dimensional randomness (Zenil et al. 
2015) could be based on the probability that a given pattern appears in natural visual 
scenes. They presented a series of patterns, which were 4 × 4 grids with 8 black cells 
and 8 white cells. Subjects had to decide whether these grids were “random” or not. 
The proportion of subjects answering “random” was used as a measure of subjective 
randomness. The authors also scanned a series of still nature shots. The pictures were 
binarized to black and white using the median as threshold, and every 4 × 4 grid was 
extracted from them. From this dataset, a distribution was computed. Hsu et al. (2010) 
found that the probability that a grid appears in photographs of natural scenes was 
correlated with the human estimate: the more frequent the grid, the less random it is 
rated. The authors interpret these results as evidence that we learn to perceive 
randomness through our eyes. An extension of the Hsu et al. (2010) study confirmed 
the correlation, and found that both subjective randomness and frequency in natural 
scenes were correlated with the algorithmic complexity of the patterns (Gauvrit et al. 
2014c). It was found that natural scene statistics partly explain how we perceive 
randomness/complexity. 
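The image preprocessing described above can be sketched in Python under stated assumptions: the image is a list of rows of gray levels, pixels strictly above the median become 1 (ties go to 0), and “every 4 × 4 grid” is read as every overlapping 4 × 4 window. These choices, and the name `patch_distribution`, are ours and not necessarily those of Hsu et al. (2010).

```python
from collections import Counter
from statistics import median

def patch_distribution(image, size=4):
    """Binarize a grayscale image (list of rows of numbers) at its median and
    return the relative frequency of every size x size window."""
    threshold = median(v for row in image for v in row)
    # Assumption: values strictly above the median become 1, the rest 0.
    binary = [[1 if v > threshold else 0 for v in row] for row in image]
    height, width = len(binary), len(binary[0])
    counts = Counter()
    # Assumption: windows overlap, one per top-left position.
    for r in range(height - size + 1):
        for c in range(width - size + 1):
            patch = tuple(tuple(binary[i][c:c + size]) for i in range(r, r + size))
            counts[patch] += 1
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()}

# Toy "images": a gradient and a flat gray square.
gradient = patch_distribution([[5 * r + c for c in range(5)] for r in range(5)])
flat = patch_distribution([[7] * 6 for _ in range(6)])
```

The resulting frequencies are the kind of natural-scene statistic that was compared with the proportion of subjects answering “random”. Note that Hsu et al. presented only grids with 8 black cells, whereas this sketch counts every window.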

In contrast to the aforementioned experiments, according to which even children 
under one year of age display the ability to detect regularities, these results suggest 
that the perception of randomness is not innate. Our sense of randomness could 
develop very early in life, based on the visual scenes we see and an innate sense of 
statistics, but it evolves over time. 

Our results in (Gauvrit et al. 2014a) suggest that the mind is intrinsically wired to 
believe that the world is algorithmic in nature, that is, that what happens in it is likely 
the output of a random computer program and not of a process producing uniform 
classical randomness. To know whether this is the result of biological design or a 
developed “skill”, one can look at whether and how people develop this view during 
their lifetime. Preliminary results suggest that this algorithmic view of, or filter on, the 
world is constructed over time, with algorithmic randomness production and 
perception reaching a peak at about age 25 
(see http://complexitycalculator.com/hrng/). This means that the mind adapts and 
learns from experience, and for some reason develops the algorithmic perception 
before changing again. We would not have developed such a worldview, peaking at 
reproductive age, had it conferred no evolutionary advantage, since it seems unlikely 
to be a neutral feature of the mind: it affects the way the mind perceives events and 
therefore how it learns and behaves. Nor is this exclusive to the human mind; it 
extends to animal behavior, as shown in (Zenil et al. 2015). All this evidence points in 
the same direction: the world is, or appears to us to be, highly algorithmic in nature, 
at least transitionally. 
In different areas of social science and cognitive science, it has been found that 
regularities may arise from the transmission of cultural items (Smith & Wonnacott 
2010). For instance, in the study of rumors, it has been found that some categories of 
rumor disseminate easily, while others simply disappear. The “memes” that are easily 
remembered, believed and transmitted share some basic properties, such as a balance 
between expected and unexpected elements, as well as simplicity (Bronner 2010). 
Stories that are too complex, it is believed, are too hard to remember and are doomed 
to disappear. Moreover, in spreading rumors the transmitters make mistakes, in a 
partially predictable way. As a rule, errors make the message simpler and thus easier 
to retain. 

For instance, Barrett and Nyhof (2001) had participants learn fabricated stories 
including some intuitive and some counter-intuitive elements. They found that 
slightly counter-intuitive elements were better recalled than intuitive elements. In 
research along the same lines, Atran and Norenzayan (2004) found an inverted 
U-shaped curve between the proportion of counter-intuitive elements and the 
correct recall of stories. 

Two complementary concepts are prominent in the experimental psychology of 
cultural transmission. Learnability measures how easily an item (story, word, 
pattern) is remembered by someone. More learnable elements are more readily 
disseminated. They are selected in a “Darwinian” manner, since learnability is a 
measure of how well adapted the item is to the human community in which it will 
live or die. 

Complexity is another important property of cultural items. In general, complex 
items are less learnable, but there are exceptions. For instance, a story without any 
interest will not be learned, even if it is very simple. Interest, unexpectedness and 
humor also play a role in determining how learnable, and also how “buzzy”, an item 
is. 

When researchers study complex cultural items such as stories, arguments, or 
even theories or religions, they are bound to measure complexity and learnability 
by using more or less tailor-made tools. Typically, researchers use the number of 
elements in a story, as well as their interrelations to rate its complexity. The number 
of elements and relations someone usually retains is an index of learnability. 

The paradigm of iterated learning is an interesting way to study cultural 
transmission and to investigate how it leads to structural transformations of the 
items to be transmitted. In the iterated learning design, a first participant has to learn 
a message (a story, a picture, a pseudo-language, etc.) and then reproduce or describe 
it to a second participant. The second participant then describes the message to a 
third, and so on. A chain of transmission can comprise tens of participants. 

Certainly, algorithmic complexity is not always usable in such situations. In 
fundamental research in cognitive psychology, however, it is appropriate. Instead 
of investigating complex cultural elements such as rumors or pseudo-sciences, we 
turned to simpler items, in the hope of achieving a better understanding of how 
structure emerges in language. Studies using pseudo-languages (Kirby et al. 2008) 
have shown a decrease in intuitive complexity along the transmission chains. 
In a more recent study, researchers used two-dimensional patterns of points 
(Kempe et al. 2015). Twelve tokens were randomly placed on 10 × 10 grids. The first 
participant had to memorize the pattern in 10 seconds, and then reproduce it on a 
blank grid, using new tokens. The second participant then had to reproduce the first 
participant’s response, and so on. Participants in a chain were sometimes all adults, 
and sometimes all children. Learnability can be defined as the number of correctly 
placed tokens. As expected, learnability continuously increases within each 
transmission chain, at the same rate for children and adults. As a consequence of 
increasing learnability, each chain converges toward a highly learnable (and also 
simple) pattern, which differs from one chain to another. 

The fact that children’s and adults’ chains share the same rate of increase in 
learnability is striking, given their different cognitive abilities, but it may be 
explained by complexity. Indeed, the algorithmic complexity of the patterns within 
each chain decreases continuously, but at a faster rate in children’s chains than in 
adults’ chains. 

The Less-is-More effect (Kersten & Earles 2001) in psycholinguistics designates the 
fact that babies, with low cognitive abilities, are far better than adults at learning 
language. Thus, less cognitive ability could actually be better when it comes to 
learning language. One possible explanation of the paradox relies on complexity: as 
a consequence of their reduced cognitive power, babies could be bound to “compress 
the data” more, that is, to build a simpler, though possibly faulty, representation of the 
data they have access to. The recent study of Kempe et al. (2015) supports this idea 
(albeit remotely) by showing that children achieve the same increase in learnability 
as adults through a much quicker decrease in algorithmic complexity, which could be 
a consequence of reduced cognitive power. 


4 Concluding remarks 

We have surveyed most of what has formally been done to connect cognition to 
computation and to explain it through computation, from Turing to Searle to Tononi 
to our own work. It may seem that the mathematical/computational framework is 
taken too far, for example in interpreting how humans may remember long sequences, 
given that remembering such sequences may never be useful to any organism. But 
what lies at the core of the argument is not the object in question (the string) but the 
mechanism. Learning by systematically incorporating information from the 
environment is ultimately what a DNA sequence embodies, and what the evolution of 
brains made faster, speeding up an organism’s understanding of its environment. Of 
course, the theoretical features of a Turing machine model (such as halting or having 
an unbounded tape) are irrelevant if the model is taken trivially as an analogy for 
biological processes, but this misses the point. The point is not that organisms or 
biological processes are, or may look like, Turing machines, but that Turing machines 
are mechanistic explanations of behavior and, as shown by algorithmic probability, an 
optimal model for hypothesis testing. This is, in our opinion, of the utmost relevance 
to biological reality and to what we call the algorithmic cognition approach to the 
cognitive sciences, at the forefront of pattern recognition and multisensory 
integration. 

It may also be objected that computational and algorithmic complexity focus only 
on classifying algorithmic problems according to their inherent difficulty or 
randomness, and as such cannot be equated with models of cognition; and, 
moreover, that algorithmic analysis can give meaningful results only for a relatively 
narrow class of algorithms that are neither too general nor too complex and detailed, 
and that reflect the main ingredients of cognitive processes. These objections would 
hold only if one overlooked the power and universality of the results of algorithmic 
probability (the other side of the coin of algorithmic complexity) as introduced by 
Solomonoff (Solomonoff 1964) and Levin (Levin 1974). An introduction that gives it 
proper credit as the optimal theory of formal induction/inference, and therefore of 
learning, can be found in (Kirchherr et al. 1997). The fact that the theory is based on 
the Turing machine model is irrelevant and cannot be used as an objection, as shown, 
for example, by the Invariance Theorem (Solomonoff 1964; Levin 1974), which also 
applies to algorithmic probability; moreover, Levin showed that, regardless of the 
computational model, algorithmic probability converges and the universal 
semi-measure dominates (Kirchherr et al. 1997). We have adopted this powerful 
theory and used it to advance the algorithmic cognition approach, for which we have 
devised specific tools with a wide range of applications (Zenil and Villarreal-Zapata 
2013; Zenil 2010) that also conform to animal and human behavior (Zenil 2013; 
Gauvrit et al. 2014a; Gauvrit et al. 2014b; Gauvrit et al. 2014c; Kempe et al. 2015; 
Mathy et al. 2014; Zenil et al. 2015). 

Just as Tononi et al. made substantial progress in discussing an otherwise more 
difficult topic by connecting the concept of consciousness to information theory, we 
have offered what we think is an essential, and what appears to be a necessary, 
connection between the concept of cognition and algorithmic information theory. 
Indeed, within cognitive science, the study of working memory, probabilistic 
reasoning, the emergence of structure in language, strategic response and 
navigational behavior is cutting-edge research. In all these areas we have made 
contributions (Zenil 2013; Gauvrit et al. 2014a; Gauvrit et al. 2014b; Gauvrit et al. 
2014c; Kempe et al. 2015; Mathy et al. 2014; Zenil et al. 2015) based upon algorithmic 
complexity as a useful normative tool, shedding light on the mechanisms of cognitive 
processes. We have proceeded by taking the mind to be a form of algorithmic 
compressor, for the reasons provided in this survey: the simple fact that the mind 
does not operate through a lookup table, and the ways in which it manifests biases 
that seem to be accounted for only by algorithmic probability, based purely on 
computability theory. The research promises to generate even more insights into the 
fundamentals of human and animal cognition (Zenil et al. 2015), with 
cross-fertilization taking place from the artificial to the natural realms and vice versa, 
as algorithmic measures of the properties characterizing human and animal behavior 
can be of use in artificial systems such as robotics (Zenil, to appear), which amounts to 
a sort of reverse Turing test, as described in (Zenil 2014b) and (Zenil 2014a). 